📊 Statistical Ranking - emschwartz · Scour

🎰 Bandit Algorithms · arxiv.org

Stop the Sampler! Classifier-Based Adaptive Stopping for Sampling Kernels

Less-relevant results

💻 Coding Agents · joyemang33.github.io

Humans Still Beat AI in the Long Horizon

Covers 2 stories, including "Implications of Large-Scale Test-Time Compute" (5-minute read)

Discussed on Hacker News

🆕 New AI · artificialanalysis.ai

Launch: AA-Briefcase, a new proprietary benchmark for long-horizon knowledge work

Covered by The Decoder, news.smol.ai

Discussed on Hacker News and r/LocalLLaMA

🎛️ Feed Filtering · Tech Xplore

Why asking people to rank three options could sharpen AI and recommendation systems

🏆 LLM Benchmarking · arxiv.org

Bayesian Inference and Decision Audits for Public Archives of Frontier AI Evaluations

🏗️ LLM Infrastructure · arxiv.org

Nonlocal Bayesian Modeling of Continuous Spatio-Temporal Dynamics

🎰 Bandit Algorithms · arxiv.org

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

🏆 LLM Benchmarking · arxiv.org

From Drift to Coherence: Stabilizing Beliefs in LLMs

🎖 Text Quality Models · arxiv.org

TuneJury: An Open Metric for Improving Music Generation Preference Alignment

🧠 LLM Inference · arxiv.org

Uncertainty Estimation and Generalization Bounds for Modern Deep Learning

🎰 Bandit Algorithms · arxiv.org

A Decision-Theoretic View of Test-Time Training: When, How Far, and Which Directions to Adapt

📜 Economic History · arxiv.org

Surprise-Guided MergeSort: Budget-Efficient Human-in-the-Loop Ranking via Adaptive Comparison Scheduling

⚡ PGO · arxiv.org

Online Convex Optimization with Sublinear Noisy Probes

📋 MCP · arxiv.org

Amortized mean-shift interacting particles

🧠 LLM Inference · arxiv.org

Attention-Based Estimation of the Individual Treatment Benefit Probability under Dose Variation

🏆 LLM Benchmarking · arxiv.org

Limited Marginal Benefit of Reasoning-Heavy LLM Deployment in ESG Narrative Scoring: A 4-Model Consensus Study on Japanese Listed Firms

🏗️ LLM Infrastructure · arxiv.org

RouteJudge: An Open Platform for Reproducible and Preference-Aware LLM Routing

✨ Gemini · arxiv.org

City landscape in sight: A crowdsourced framework for unlocking urban-scale window view perceptions from real estate imagery

👑 Leader Election · arxiv.org

Comparison Patrols on Drifting Orders: Certified Rank Maintenance, Evolving Planar Maxima, and Selection under Drifting Fitness

No more posts from emschwartz's subscribed feeds.
